Parallel meta-blocking for scaling entity resolution over big heterogeneous data
Similar resources
Parallel meta-blocking for scaling entity resolution over big heterogeneous data
Entity resolution constitutes a crucial task for many applications, but has an inherently quadratic complexity. In order to enable entity resolution to scale to large volumes of data, blocking is typically employed: it clusters similar entities into (overlapping) blocks so that it suffices to perform comparisons only within each block. To further increase efficiency, Meta-blocking is being used...
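The blocking idea described in this abstract can be illustrated with a minimal sketch. It assumes token blocking (one block per distinct word) as the blocking method, which the abstract does not specify; the function names and the sample records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(entities):
    """Group entity ids into one block per token (assumed blocking scheme)."""
    blocks = defaultdict(set)
    for entity_id, text in entities.items():
        for token in text.lower().split():
            blocks[token].add(entity_id)
    # A block holding a single entity yields no comparisons, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical records; two real-world products described twice each.
entities = {
    "a1": "Apple iPhone 7",
    "a2": "iphone 7 by apple",
    "b1": "Samsung Galaxy S7",
    "b2": "galaxy s7 samsung",
}
blocks = token_blocking(entities)
pairs = candidate_pairs(blocks)
print(sorted(pairs))
```

With four records, exhaustive matching would compare six pairs. Here the blocks overlap (the same pair shares several tokens), yet only the pairs that share at least one block remain as candidates.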
Scaling Entity Resolution to Large, Heterogeneous Data with Enhanced Meta-blocking
Entity Resolution constitutes a quadratic task that typically scales to large entity collections through blocking. The resulting blocks can be restructured by Meta-blocking in order to significantly increase precision at a limited cost in recall. Yet, its processing can be time-consuming, while its precision remains poor for configurations with high recall. In this work, we propose new meta-blo...
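The restructuring step mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the method of the paper: each candidate pair is weighted by the number of blocks it shares, and pairs below the mean weight are pruned. The function name, the pruning rule, and the sample blocks are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def prune_by_shared_blocks(blocks):
    """Keep pairs whose shared-block count reaches the mean count (assumed rule)."""
    weights = Counter()
    for ids in blocks.values():
        for pair in combinations(sorted(ids), 2):
            weights[pair] += 1
    if not weights:
        return set()
    threshold = sum(weights.values()) / len(weights)
    return {pair for pair, weight in weights.items() if weight >= threshold}


# Hypothetical blocks: "apple" wrongly groups a phone with a juice product.
example_blocks = {
    "apple": {"a1", "a2", "c1"},
    "iphone": {"a1", "a2"},
    "7": {"a1", "a2"},
    "juice": {"c1", "c2"},
}
kept = prune_by_shared_blocks(example_blocks)
print(sorted(kept))
```

In this toy input the pair (a1, a2) shares three blocks while every other pair shares one, so only (a1, a2) survives. The pair (c1, c2) is discarded as well, which shows how pruning of this kind can raise precision while giving up some recall.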
Entity Resolution in a Big Data Framework
Resource Description Framework (RDF) is a data model that can be used to publish semistructured data visualized as directed graphs. An example is Dataset 1 in Fig. 1. Nodes in the graph represent entities and edges represent properties connecting these entities. Two nodes may refer to the same logical entity, despite being syntactically disparate. For example, the entity Mickey Beats in Datase...
Top-K Entity Units Retrieval Over Big Data
During the past several years, data size has increased explosively. This data explosion has affected fields ranging from biomedical engineering and business consulting to social media and mobile applications. Big Data is a double-edged sword. While it provides highly valuable insights in the commercial sphere and innovative discoveries in science, Big Data also has many cha...
Scaling Security for Big, Parallel File Systems
The need for peta- and exabyte-scale parallel file systems that support high-performance computing (HPC) has been rapidly increasing. These systems have unique demands, different from those of traditional distributed file systems. As a result, securing I/O in big, parallel file systems without significantly impacting performance has proven challenging. Parallel file systems are commonly composed ...
Journal
Journal title: Information Systems
Year: 2017
ISSN: 0306-4379
DOI: 10.1016/j.is.2016.12.001